New drugs and stock market: a machine learning framework for predicting pharma market reaction to clinical trial announcements

Pharmaceutical companies operate in a strictly regulated and highly risky environment in which a single slip can have serious financial consequences. Announcements of clinical trial results therefore tend to shape a company's future course and are closely monitored by the public. Most works focus on retrospective analysis of the impact of announcements on company stock prices and do not consider the problem in a predictive setting. In this work, we aim to close this gap by proposing a framework that predicts the numerical values of announcement-induced changes in stock prices. In essence, this is the problem of predicting the impact of a specific event on the corresponding time series. Our framework includes a BERT model for extracting the sentiment polarity of announcements, a Temporal Fusion Transformer for forecasting the expected return, a graph convolution network for capturing event relationships, and gradient boosting for predicting the price change. We work with one of the biggest FDA (Food and Drug Administration) datasets, consisting of 5436 clinical trial announcements from 681 companies for the years 2018–2022. The study yields several significant outcomes and domain-specific insights. Firstly, we obtain statistical evidence that the publication of clinical results influences the market value of public pharma companies. Secondly, we observe inherently different patterns of response to positive and negative announcements, with a stronger and more pronounced reaction to negative clinical news. Thirdly, we identify two factors that play a crucial role in the predictive framework: (1) the size of the company's drug portfolio, where low diversification among drug products implies greater susceptibility to an announcement, and (2) the announcement network effect, which manifests as an increase in predictive power when exploiting interdependencies between events belonging to the same company or nosology. Finally, we demonstrate the viability of the forecast setting by obtaining ROC AUC scores predominantly greater than 0.7 for the classification of price change on historical data. We emphasize that the developed framework is transferable and generalizable to other datasets and domains, provided that two key entities are present: events and the associated time series.

The correspondence between the share of considered trading volume peaks and the peak duration. The red line marks 90% of the total number of peaks.

Other relationships between announcement impact and company characteristics
The number of FDA news announcements per year for public companies is shown in Figure 2. The number of FDA announcements in open sources increased approximately tenfold from 2017 to 2021.
Figures 3, 4 and 5 show how the stock price changes caused by announcements depend on the company's age since its IPO at the moment of the announcement, on its historical volatility, and on its 30-day stock price trend. Volatility is defined as the ratio of the standard deviation of the stock price to the median stock price over the 200-day period before the announcement. We can conclude that the younger a company is, the more sensitive it is to announcements. Greater intrinsic company volatility results in more extreme price changes. In addition, if the company's stock price trend is close to zero in the pre-event period, then the induced price change is not significant.

Dependence of the stock price changes on the company age. In the left part, the red dashed lines divide the announcements into equal groups (with the same number of announcements in each). In the right part, the red dashed line passes through the median stock price change of each group.
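The volatility definition given in this section can be illustrated with a minimal sketch. The function name `historical_volatility`, the use of the sample (rather than population) standard deviation, and the assumption that prices arrive as a list ordered from oldest to newest and ending just before the announcement are ours; the text does not specify these details.

```python
from statistics import median, stdev


def historical_volatility(prices, window=200):
    """Ratio of the standard deviation to the median of the last `window` prices.

    `prices` is assumed to be ordered oldest to newest and to end on the last
    observation before the announcement (hypothetical input convention).
    """
    recent = prices[-window:]
    if len(recent) < 2:
        raise ValueError("at least two prices are needed to compute a deviation")
    # Sample standard deviation is an assumption; the text only says
    # "standard deviation".
    return stdev(recent) / median(recent)
```

For example, `historical_volatility(daily_closes)` would use the last 200 entries of a hypothetical `daily_closes` list; a shorter `window` can be passed for testing on small samples.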

Mismatch analysis between announcement polarities and price change
To quantify the mismatch between the event sentiment according to FDA announcements and the event sentiment according to the actual price change, we build the confusion matrix shown in Figure 6. The sentiment polarity derived from the stock price change, namely $\mathrm{NCAR}_{20}$, is defined as follows: the event is labeled positive if its price change is greater than $\sigma_{\text{neut}}/2 = 11.5\%$ (where $\sigma_{\text{neut}}$ is the standard deviation of $\mathrm{NCAR}_{20}$ for neutral announcements) and negative if it is less than $-\sigma_{\text{neut}}/2$. If the price change lies within $[-\sigma_{\text{neut}}/2;\ +\sigma_{\text{neut}}/2]$, the event is labeled neutral according to $\mathrm{NCAR}_{20}$. As a result, the highest rates in the confusion matrix are 23.7% and 24.1%; they correspond to the cases in which negative announcements have negative or neutral price changes.
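The labeling rule above can be sketched in a few lines. The threshold of 0.115 is the value of half the neutral standard deviation quoted in the text; the function names and the choice to normalize each confusion-matrix cell by the total number of events are illustrative assumptions, since the text does not state how the reported rates are normalized.

```python
from collections import Counter


def polarity_from_ncar(ncar_20, threshold=0.115):
    """Label an event by its price change; `threshold` is half the neutral std."""
    if ncar_20 > threshold:
        return "positive"
    if ncar_20 < -threshold:
        return "negative"
    # The closed interval [-threshold, +threshold] is neutral.
    return "neutral"


def confusion_shares(announced, ncar_values, threshold=0.115):
    """Share of events in each (announced polarity, price-derived polarity) cell.

    Normalizing by the total number of events is an assumption.
    """
    cells = Counter(
        (label, polarity_from_ncar(value, threshold))
        for label, value in zip(announced, ncar_values)
    )
    total = sum(cells.values())
    return {cell: count / total for cell, count in cells.items()}
```

Calling `confusion_shares` on the list of announcement polarities and the matching list of price changes returns a dictionary keyed by pairs such as `("negative", "neutral")`.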

Comparison analysis of models for classification of price change
To solve the problem of price change classification, we test Gradient Boosting (GB) and Random Forest (RF), which are common machine learning models for tabular data. In addition, we experiment with a Graph Convolution Network (GCN) in a node classification setting. We also explore the model combinations GCN+GB and GCN+RF, with the aim of improving the final predictive quality. The results are presented in Table 1. The best metrics are achieved with GCN+GB.
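The abstract reports ROC AUC as the measure of classification quality. As a reminder of what this score measures, the sketch below computes it directly from its pairwise-ranking definition. The one-vs-rest treatment of the price change classes is an assumption made for illustration, and the function names are hypothetical; this is not the evaluation code behind Table 1.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    `labels` holds 1 for the positive class and 0 otherwise; ties count 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(classes, true_labels, class_scores):
    """ROC AUC per class, treating each class in turn as the positive one.

    `class_scores[i][c]` is the model score of class `c` for example `i`.
    """
    return {
        c: roc_auc(
            [1 if y == c else 0 for y in true_labels],
            [row[c] for row in class_scores],
        )
        for c in classes
    }
```

The quadratic pairwise loop keeps the definition explicit; for large datasets a rank-based implementation from an established library would be the practical choice.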

Statement on computational resources and environmental impact
We used an NVIDIA GeForce RTX 2070 SUPER GPU and an NVIDIA A100 80GB PCIe GPU to train the classifiers and the BERT model, respectively.
This work contributed 1.34 kg and 6 g of equivalent CO2 emissions during the training of the classifiers and the BERT model, respectively. The carbon emissions information was generated using the open-source library eco2AI (https://github.com/sb-ai-lab/Eco2AI).